Abstract


We introduce the Variational Holder (VH) bound as an alternative to Variational Bayes 
(VB) for approximate Bayesian inference. Unlike VB which typically involves maximization 
of a non-convex lower bound with respect to the variational parameters, the VH bound 
involves minimization of a convex upper bound to the intractable integral with respect 
to the variational parameters. Minimization of the VH bound is a convex optimization 
problem; hence the VH method can be applied using off-the-shelf convex optimization 
algorithms and the approximation error of the VH bound can also be analyzed using tools 
from convex optimization literature. We present experiments on the task of integrating a 
truncated multivariate Gaussian distribution and compare our method to VB, EP and a 
state-of-the-art numerical integration method for this problem. 

1 Introduction 

Many Bayesian machine learning problems involve an intractable sum or integral, for which
numerical approximation methods have been derived. Approximate Bayesian inference techniques
can be broadly classified into sampling-based (e.g. Markov chain Monte Carlo) and
optimization-based (e.g. variational Bayes, expectation propagation) methods. While sampling
techniques are widely used to explore the space and compute the statistics of interest for the
problem, they are not always satisfying: they are stochastic in nature, and their convergence is
hard to assess.

Many algorithms involve the computation of an objective function, such as a loss function,
a negative log-likelihood or an energy criterion. However, the objective function itself often
includes sums that are slow to compute and must therefore be approximated. This is the
case in empirical Bayes methods (a.k.a. type-II maximum likelihood), mixture models with a
latent state space such as high-order hidden Markov models and restricted Boltzmann machines,
or even simple Maximum Likelihood (ML) with fully observed data: the ML estimator of
exponential family models with non-standard feature functions requires the computation of the
partition function, which is intractable as soon as the feature functions or the parameter space
do not belong to the restricted class of tractable models, which includes Gaussian distributions
and tree-structured graphical models for non-Gaussian distributions. For other models, the
partition function needs to be approximated, and a full set of approximate inference algorithms
has been designed during the last decades, including pseudo-likelihood approaches (Gourieroux
et al., 1984), but these approaches do not show good empirical performance and do not really
help to predict the likelihood of the observations. For other approximation schemes based on
mean field approximations, obtaining algorithms with provable polynomial-time convergence
guarantees and other theoretical guarantees is hard in general (Wainwright and Jordan, 2008).

In Bayesian statistics, many deterministic inference approaches have been proposed, the
main ones being Variational Bayes (VB) (Williams and Hinton, 1991; Jordan et al., 1999; Attias,
2000), Expectation-Propagation (EP) (Minka, 2005), and Tree-Reweighted sum-product
(TRW) (Wainwright et al., 2005). For continuous variables, classical approximate inference
schemes are based on EP or the Variational Gaussian (VG) representation, which is basically
the information inequality applied to the Gaussian case (Challis and Barber, 2011). However, the
VG bound is known to be a crude inequality which tends to underestimate the variance, leading
to poor results in situations where variance estimates are crucial, for example in Bayesian
experimental design (Seeger and Nickisch, 2011). More interestingly, Liu and Ihler (2011) showed
that new inference algorithms can be obtained by minimizing the generalized Holder’s inequality
applied to the partition function of a discrete graphical model. Such algorithms do not suffer
from the zero-avoiding behavior of VB or the lack of convergence guarantees of EP, and have
strong connections with the TRW convex upper bound to the partition function.

In this work, we introduce the Variational Holder (VH) inequality, a family of tractable upper 
bounds to the product of potentials, possibly defined on a continuous space, unlike previous work 
focusing only on the discrete case. Hence, our bound generalizes earlier work by Liu and Ihler 
(2011) and is simpler in construction. We show that we can infer the values of continuous latent
variables in a Bayesian inference problem where the unnormalized integral is a product of two
potentials corresponding to the prior and likelihood respectively. The optimization with respect
to the variational parameters in VH is a convex optimization problem and can be solved using
off-the-shelf tools. We compare the performance of our method to VB, EP and a state-of-the-art
numerical integration method on the task of integrating a truncated multivariate Gaussian
distribution.


2 Variational Holder bound 

Notations We define a probability space $(\Omega, \mathcal{F}, \nu)$ where $\Omega$ is a sample space and $\mathcal{F}$ a
sigma-algebra defined on it. Let $Z$ be a $\nu$-distributed random variable taking values in a Hilbert
space $\mathcal{Z}$. We make use of $L_p$ norms $\|\cdot\|_p$ repeatedly, where $p \geq 1$ and
$\|f\|_p := \left(\int |f(Z)|^p \, d\nu(Z)\right)^{1/p}$ for $p < \infty$ and $\|f\|_\infty := \sup_z |f(z)|$. Let:

$$ I^* := \int \gamma_1(Z)\,\gamma_2(Z)\, d\nu(Z) = \|\gamma_1\gamma_2\|_1 \tag{1} $$

be the integral we want to approximate, also called the partition function of the unnormalized
distribution with density $\gamma(Z) := \gamma_1(Z)\gamma_2(Z)$.

Upper bound to the partition function We define the following functional:

$$ I_\alpha(\Psi) := \|\gamma_1 \Psi\|_{\alpha_1} \, \|\gamma_2 / \Psi\|_{\alpha_2} \tag{2} $$

where the argument of $I_\alpha$ is a positive function $\Psi : \mathcal{Z} \to \mathbb{R}^+$ which we refer to as the pivot
function and $\alpha = (\alpha_1, \alpha_2) \in \mathbb{R}_+^2$. The main result of this paper is the study of a new
inequality for the log-partition function, which we call the Variational Holder (VH) inequality
because it corresponds to a direct application of the well-known Holder’s inequality:

Theorem 1 Let $\gamma_1$ and $\gamma_2$ be two positive measures defined on $\mathcal{Z}$. The following inequality:

$$ I^* \leq I_\alpha(\Psi) \tag{3} $$

holds for any positive scalars $\alpha_1$ and $\alpha_2$ such that $\frac{1}{\alpha_1} + \frac{1}{\alpha_2} = 1$ and any function $\Psi : \mathcal{Z} \to \mathbb{R}^+$.

Equality holds if for almost all $z \in \mathcal{Z}$, $\Psi(z) = \gamma_1(z)^{-\frac{1}{\alpha_2}} \gamma_2(z)^{\frac{1}{\alpha_1}}$.
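PROOF. The bound (3) is obtained using Holder's inequality $\|fg\|_1 \leq \|f\|_{\alpha_1} \|g\|_{\alpha_2}$ for
$f \leftarrow \gamma_1\Psi$ and $g \leftarrow \gamma_2/\Psi$. The tightness result is given by a direct calculation:

$$ I_\alpha\!\left(\gamma_1^{-\frac{1}{\alpha_2}}\gamma_2^{\frac{1}{\alpha_1}}\right)
 = \left\|\gamma_1^{1-\frac{1}{\alpha_2}}\gamma_2^{\frac{1}{\alpha_1}}\right\|_{\alpha_1} \left\|\gamma_1^{\frac{1}{\alpha_2}}\gamma_2^{1-\frac{1}{\alpha_1}}\right\|_{\alpha_2}
 = \left\|(\gamma_1\gamma_2)^{\frac{1}{\alpha_1}}\right\|_{\alpha_1} \left\|(\gamma_1\gamma_2)^{\frac{1}{\alpha_2}}\right\|_{\alpha_2}
 = \|\gamma_1\gamma_2\|_1^{\frac{1}{\alpha_1}} \|\gamma_1\gamma_2\|_1^{\frac{1}{\alpha_2}}
 = \|\gamma_1\gamma_2\|_1 = I^*. $$

□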

The key insight in the VH bound over the standard Holder bound is that we choose
the pivot function $\Psi$ so that the bound is as close as possible to the target integral. The upper
bound on the right-hand side of (3) has several useful properties. The first is that the upper
bound can be tractable even if the original quantity $\|\gamma_1\gamma_2\|_1$ is intractable. The second
is that the log of the bound is convex in $\log(\Psi)$, which makes it convenient to optimize,
in particular using gradient descent methods that are provably convergent in polynomial
time. Finally, this bound has theoretical properties that make it suitable for approximating
distributions, as shown in the next section.
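As a numerical sanity check of Theorem 1, the following sketch evaluates both sides of the VH inequality on a one-dimensional example. The two factors (a logistic "smooth step" and an unnormalized standard Gaussian), the Gaussian pivot, the integration grid and all function names are assumptions chosen for illustration; they are not taken from the experiments of this paper.

```python
import math

def integrate(fn, lo=-12.0, hi=12.0, n=4801):
    """Composite trapezoid rule on a uniform grid (adequate for smooth, decaying integrands)."""
    h = (hi - lo) / (n - 1)
    total = 0.5 * (fn(lo) + fn(hi))
    for k in range(1, n - 1):
        total += fn(lo + k * h)
    return total * h

def gamma1(t):
    """Illustrative first factor: a strictly positive logistic 'smooth step'."""
    return 1.0 / (1.0 + math.exp(-3.0 * t))

def gamma2(t):
    """Illustrative second factor: an unnormalized standard Gaussian."""
    return math.exp(-0.5 * t * t)

def vh_bound(pivot, alpha1):
    """I_alpha(Psi) = ||gamma1 * Psi||_{alpha1} * ||gamma2 / Psi||_{alpha2}."""
    alpha2 = alpha1 / (alpha1 - 1.0)  # conjugate exponent: 1/alpha1 + 1/alpha2 = 1
    n1 = integrate(lambda t: (gamma1(t) * pivot(t)) ** alpha1) ** (1.0 / alpha1)
    n2 = integrate(lambda t: (gamma2(t) / pivot(t)) ** alpha2) ** (1.0 / alpha2)
    return n1 * n2

def gaussian_pivot(tau1, tau2):
    """Log-quadratic pivot exp(-tau1 t^2 / 2 + tau2 t)."""
    return lambda t: math.exp(-0.5 * tau1 * t * t + tau2 * t)

def optimal_pivot(alpha1):
    """Pivot gamma1^(-1/alpha2) * gamma2^(1/alpha1), for which Theorem 1 states equality."""
    alpha2 = alpha1 / (alpha1 - 1.0)
    return lambda t: gamma1(t) ** (-1.0 / alpha2) * gamma2(t) ** (1.0 / alpha1)

exact = integrate(lambda t: gamma1(t) * gamma2(t))
loose = vh_bound(gaussian_pivot(0.5, 0.3), alpha1=2.0)
tight = vh_bound(optimal_pivot(2.0), alpha1=2.0)
print(f"I* = {exact:.6f}, Gaussian pivot: {loose:.6f}, optimal pivot: {tight:.6f}")
```

With an arbitrary pivot the bound is loose but valid; with the optimal pivot both norms reduce to powers of the target integral and the gap closes, up to quadrature error.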


3 Theoretical Guarantees 

In this section, we show that under mild conditions, the VH bound is good for variational
inference: when the upper bound is close to the target partition function, the resulting
approximation is also close to the target distribution $p^* := \gamma_1\gamma_2 / I^*$.

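Proposition 1 For any $\epsilon > 0$, the inequality $I^* \geq (1 - \epsilon)\, I_\alpha(\Psi)$ implies that:

$$ \left\| \frac{(\gamma_1\Psi)^{\alpha_1}}{\|\gamma_1\Psi\|_{\alpha_1}^{\alpha_1}} - p^* \right\|_1 \leq \sqrt{2\epsilon} + \epsilon \quad \text{if } \alpha_1 \leq 2, \text{ and} \tag{4} $$

$$ \left\| \frac{(\gamma_2/\Psi)^{\alpha_2}}{\|\gamma_2/\Psi\|_{\alpha_2}^{\alpha_2}} - p^* \right\|_1 \leq \sqrt{2\epsilon} + \epsilon \quad \text{if } \alpha_1 > 2. \tag{5} $$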


The proof is given in the appendix for clarity, but it is novel and is one of the key contributions
of the paper. Proposition 1 shows that the smaller the relative gap of the VH inequality is,
the better the functions $(\gamma_1\Psi)^{\alpha_1}$ or $(\gamma_2/\Psi)^{\alpha_2}$ can approximate the target distribution $p^*$. This
approximation is useful when $\gamma_1\gamma_2$ is hard to integrate, but $(\gamma_1\Psi)^{\alpha_1}$ and $(\gamma_2/\Psi)^{\alpha_2}$ are easy to
integrate.

Now that the approximation properties of the VH bounds have been highlighted, we describe 
how to effectively use these results in practice. 
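Proposition 1 can be probed numerically on a toy problem. The sketch below computes the relative gap $\epsilon$ of the bound for a fixed, non-optimal Gaussian pivot with $\alpha_1 \leq 2$, and compares the distance between the normalized function $(\gamma_1\Psi)^{\alpha_1}$ and $p^*$ with $\sqrt{2\epsilon} + \epsilon$. The factors, the pivot parameters and the helper names are illustrative assumptions. The distance is measured in the $L_1$ norm, which is an assumption of this sketch: it is the norm controlled by the Cauchy-Schwarz argument used in the appendix.

```python
import math

def integrate(fn, lo=-12.0, hi=12.0, n=4801):
    """Composite trapezoid rule on a uniform grid."""
    h = (hi - lo) / (n - 1)
    total = 0.5 * (fn(lo) + fn(hi))
    for k in range(1, n - 1):
        total += fn(lo + k * h)
    return total * h

def proposition1_check(alpha1, tau1, tau2):
    """Return (epsilon, l1_distance) for the pivot Psi(t) = exp(-tau1 t^2 / 2 + tau2 t)."""
    alpha2 = alpha1 / (alpha1 - 1.0)                        # conjugate exponent
    gamma1 = lambda t: 1.0 / (1.0 + math.exp(-3.0 * t))     # illustrative factor
    gamma2 = lambda t: math.exp(-0.5 * t * t)               # illustrative factor
    psi = lambda t: math.exp(-0.5 * tau1 * t * t + tau2 * t)
    f = lambda t: gamma1(t) * psi(t)                        # gamma1 * Psi
    g = lambda t: gamma2(t) / psi(t)                        # gamma2 / Psi
    i_star = integrate(lambda t: gamma1(t) * gamma2(t))
    f_mass = integrate(lambda t: f(t) ** alpha1)            # ||f||_{alpha1}^{alpha1}
    g_mass = integrate(lambda t: g(t) ** alpha2)            # ||g||_{alpha2}^{alpha2}
    bound = f_mass ** (1.0 / alpha1) * g_mass ** (1.0 / alpha2)
    epsilon = 1.0 - i_star / bound                          # I* = (1 - epsilon) * I_alpha
    l1 = integrate(lambda t: abs(f(t) ** alpha1 / f_mass
                                 - gamma1(t) * gamma2(t) / i_star))
    return epsilon, l1

eps, dist = proposition1_check(alpha1=1.5, tau1=0.4, tau2=0.5)
print(f"epsilon = {eps:.4f}, L1 distance = {dist:.4f}, "
      f"sqrt(2 eps) + eps = {math.sqrt(2.0 * eps) + eps:.4f}")
```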


4 Holder Variational Bayes 

Based on the previous results, we obtain a variational algorithm to approximate a product of
factors by tractable factors. To do so, we choose the pivot function in a properly chosen
tractable family $\mathcal{F} = \{\Psi(\cdot;\tau), \tau \in \mathcal{T}\}$ where $\mathcal{T}$ is the set of variational parameters that defines
the family. Then, we obtain estimates for $\alpha_1$ and $\tau$ by minimizing$^1$ the VH bound (2) over $\mathcal{T}$:

$$ (\hat\tau, \hat\alpha_1) \in \arg\min_{\tau \in \mathcal{T},\, \alpha_1} I_\alpha(\Psi(\cdot;\tau)) \tag{6} $$

Once the optimized values $(\hat\tau, \hat\alpha_1)$ have been found, the approximations to the exact
intractable distribution $p^*$ are given by:

$$ p_1(Z) := \frac{\left(\gamma_1(Z)\,\Psi(Z;\hat\tau)\right)^{\hat\alpha_1}}{\|\gamma_1\Psi(\cdot;\hat\tau)\|_{\hat\alpha_1}^{\hat\alpha_1}} \tag{7} $$

and

$$ p_2(Z) := \frac{\left(\gamma_2(Z)/\Psi(Z;\hat\tau)\right)^{\hat\alpha_2}}{\|\gamma_2/\Psi(\cdot;\hat\tau)\|_{\hat\alpha_2}^{\hat\alpha_2}} \tag{8} $$

1 We optimize (6) with respect to $\mathrm{logit}(1/\alpha_1)$ instead of $\alpha_1$, where $\mathrm{logit}(u) = \log\frac{u}{1-u}$, since the former is an
unconstrained minimization whereas the latter is a constrained minimization problem.



Other moments can be computed in a similar fashion. Note that by choosing the proper
approximating family $\mathcal{F}$, both distributions are assumed to be tractable, i.e. we can compute
their normalization constants efficiently. According to Proposition 1, one should choose $p_1$ or $p_2$
depending on whether $\alpha_1$ is smaller or greater than 2, but we also considered a convex combination
of these two tractable distributions. This amounts to using the following mixture model:

$$ p_{12}(Z) := \frac{1}{\hat\alpha_1}\, p_1(Z) + \frac{1}{\hat\alpha_2}\, p_2(Z) \tag{9} $$


as an approximating distribution. In their seminal paper, Liu and Ihler (2011) also minimized 
the Holder’s bound with respect to parameters and exponent values, but their approach is 
restricted to discrete graphical models, and they applied this idea in the framework of the 
bucket elimination algorithm. 

So far, the VH bound is very general. It can be applied on discrete and continuous spaces.
The sole assumption we made is that $p^* \in L_1$, $\gamma_1\Psi(\cdot;\tau) \in L_{\alpha_1}$ and $\gamma_2/\Psi(\cdot;\tau) \in L_{\alpha_2}$ for any $\tau \in \mathcal{T}$
and any $\alpha_k$ in the range of Holder’s exponents that are considered. We now turn to a specific
class of functions to illustrate how the VH bound is used in practice.


5 Application: Gaussian Integration 

5.1 Problem Definition 

Using the notations of the previous section, we define $\gamma_1(t) = \prod_{i=1}^n f_i(t_i)$ where each function
$f_i : \mathbb{R} \to \mathbb{R}$ is univariate. We also define $\gamma_2(t) = e^{-\frac{1}{2} t^T A t + b^T t}$ where $A$ is a symmetric $n \times n$
matrix and $b \in \mathbb{R}^n$. We use the Lebesgue measure for $\nu$. Assume we want to evaluate:

$$ I^* := \int_{\mathbb{R}^n} \prod_{i=1}^n f_i(t_i)\, e^{-\frac{1}{2} t^T A t + b^T t}\, dt. \tag{10} $$

This type of integral is common in machine learning (Seeger, 2010). Typically, this corresponds
to the marginal data probability (a.k.a. the evidence) of a linear regression model with known
variance and sparse priors, $n$ being the number of variables. Up to an affine change of variable to
obtain orthogonal univariate factors, this integral also corresponds to the data evidence of a
generalized linear model with Gaussian prior, where $n$ is the number of independent observations.
The functions $\gamma_1$ and $\gamma_2$ alone are easy to integrate. This remains true when they are multiplied
by a Gaussian potential with diagonal covariance matrix, so that we can choose the following
variational family $\{\Psi(t;\tau), \tau \in \mathbb{R}^{2n}\}$:

$$ \Psi(t;\tau) = e^{-\frac{1}{2} t^T \mathrm{diag}(\tau_1)\, t + \tau_2^T t}, \quad \tau_1 \in \mathbb{R}^n,\ \tau_2 \in \mathbb{R}^n. \tag{11} $$

An approximation to the integral (10) is obtained by minimizing the upper bound $I_\alpha$ given in
Equation (2).
5.2 Integration of orthogonal univariate functions

The first term in the bound, $\|\gamma_1\Psi(\cdot,\tau)\|_{\alpha_1}$, can be obtained efficiently in terms of univariate
integrals:

$$ \|\gamma_1\Psi(\cdot,\tau)\|_{\alpha_1}^{\alpha_1} = \prod_{i=1}^n \int_{\mathbb{R}} \left( f_i(t_i)\, e^{-\frac{1}{2}\tau_{1i} t_i^2 + \tau_{2i} t_i} \right)^{\alpha_1} dt_i = \prod_{i=1}^n U[f_i](\tau_{1i}, \tau_{2i}, \alpha_1), \tag{12} $$

where the univariate integrals $U[h] : \mathbb{R}^2 \times [0;1] \to \mathbb{R}$ are defined as

$$ U[h](a, b, \alpha_1) := \int_{\mathbb{R}} \left( h(t)\, e^{-\frac{a}{2} t^2 + b t} \right)^{\alpha_1} dt. \tag{13} $$

Here, $h$ is an arbitrary univariate function $\mathbb{R} \to \mathbb{R}^+$. These integrals can be efficiently computed
using quadrature integration (e.g. recursive adaptive Simpson quadrature), but in many practical
applications, the same functions $f_i$ in Equation (12) are used for many factors. A considerable
speedup can be obtained by designing integrals dedicated to some functions (in practice,
using pre-computed functions with linear interpolation is 10 to 100 times faster than running
a new quadrature every time). One important special case is the step function: $f_i(x) = \mathbb{I}_{\{x > 0\}}$
for all $i \in \{1, \dots, n\}$. For this function and using Gaussian pivot functions as specified in
Equation (11), we obtain a closed form expression in terms of the normal CDF $\Phi$. Completing
the square in (13) gives, for $a > 0$:

$$ U[\mathbb{I}_{\{\cdot > 0\}}](a, b, \alpha) = \sqrt{\frac{2\pi}{\alpha a}}\; e^{\frac{\alpha b^2}{2a}}\; \Phi\!\left( b \sqrt{\frac{\alpha}{a}} \right). \tag{14} $$

If there is no truncation in some dimensions, the constant one function gives: $U[1](a, b, \alpha) = \sqrt{\frac{2\pi}{\alpha a}}\; e^{\frac{\alpha b^2}{2a}}$.
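To make the role of the univariate integrals concrete, the sketch below evaluates the integral (13) for the step function and for the constant one function, both in closed form (obtained by completing the square, valid for $a > 0$) and by quadrature. A fixed-grid composite Simpson rule stands in for the recursive adaptive Simpson quadrature mentioned above; the function names, the parameter values and the truncation of the integration range at 20 are assumptions of this example.

```python
import math

def norm_cdf(x):
    """Standard normal CDF, written with erfc for accuracy in the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def u_one(a, b, alpha):
    """Closed form of U[h] for h = 1: a full Gaussian integral (requires a > 0)."""
    return math.sqrt(2.0 * math.pi / (alpha * a)) * math.exp(alpha * b * b / (2.0 * a))

def u_step(a, b, alpha):
    """Closed form of U[h] for the step function h(t) = 1 if t > 0 else 0."""
    return u_one(a, b, alpha) * norm_cdf(b * math.sqrt(alpha / a))

def u_simpson(a, b, alpha, lo, hi, n=40000):
    """Composite Simpson rule for exp(alpha * (-a t^2 / 2 + b t)) on [lo, hi]; n must be even."""
    step = (hi - lo) / n

    def integrand(t):
        return math.exp(alpha * (-0.5 * a * t * t + b * t))

    total = integrand(lo) + integrand(hi)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * integrand(lo + k * step)
    return total * step / 3.0

a, b, alpha = 1.5, 0.4, 1.7
# Step function: the integrand vanishes for t < 0, so integrate over [0, 20] only.
print(u_step(a, b, alpha), u_simpson(a, b, alpha, 0.0, 20.0))
# No truncation: integrate over a symmetric range wide enough to hold all the mass.
print(u_one(a, b, alpha), u_simpson(a, b, alpha, -20.0, 20.0))
```

In an optimizer these closed forms are evaluated many times per iteration, which is why a dedicated expression (or an interpolation table) pays off compared with a fresh quadrature.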



5.3 Gaussian Integration 

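Concerning the other factor $\|\gamma_2/\Psi(\cdot,\tau)\|_{\alpha_2}$, its log-quadratic form corresponds to a standard
Gaussian integral:

$$ \|\gamma_2/\Psi(\cdot,\tau)\|_{\alpha_2}^{\alpha_2} = \int_{\mathbb{R}^n} e^{-\frac{\alpha_2}{2} t^T (A - \mathrm{diag}(\tau_1))\, t + \alpha_2 (b - \tau_2)^T t}\, dt = (2\pi)^{\frac{n}{2}}\, e^{J\left(\alpha_2 (A - \mathrm{diag}(\tau_1)),\ \alpha_2 (b - \tau_2)\right)}, $$

where $J(M, v) := -\frac{1}{2} \log|M| + \frac{1}{2} v^T M^{-1} v$.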

5.4 Truncated multi-variate Gaussian integration 

Most of the truncated multivariate Gaussian integration problems with linear truncations can
be put under the canonical form (10) where truncations are orthogonal,$^2$ i.e. $f_i(x) = \mathbb{I}_{\{x > 0\}}$ for
all $i \in \{1, \dots, n\}$. Integrating truncated correlated Gaussians is a known open problem for which
several approximation techniques have been proposed. In numerical approximation, adaptive
quadrature approaches have been well investigated (Genz and Bretz, 2009), but are still limited
to small dimensions. Approximate inference techniques, such as Expectation-Propagation (EP),
have been recently proposed, but the algorithm remains unstable, even after specific improvements
to increase the accuracy of the method (Cunningham et al., 2011). One of the reasons is
the fact that EP does not give any guarantee about the approximation. To be used in a learning
framework, upper and lower bounds to the integral (10) are often very useful. We focus here on
the upper bound.$^3$

2 In their general form, truncated Gaussian integration problems are based on the estimation of
$\int_{\mathbb{R}^n} \prod_i \mathbb{I}_{\{a_i + \beta_i^T t > 0\}}\, e^{-\frac{1}{2} t^T A t + b^T t}\, dt$, but a change of variable $a_i + \beta_i^T t \to t_i$ leads to the canonical form
(10) if there are no parallel truncation lines, i.e. box constraints. Box constraints can be handled by a simple
modification involving two-sided univariate truncations.

[Figure 1 (image not recovered). Legend: function $\gamma_1\gamma_2$ (black), truncation (red). Recovered panel
annotations: L = 4.68; $1/\alpha_1$ = 0.70; $1/\alpha_2$ = 0.30; L = 5.01; log(Z) = 12.98 ≤ 0.00 + 13.16;
$1/\alpha_1$ = 0.00; $1/\alpha_2$ = 1.00; L = 12.61.]

Figure 1: 2D truncated Gaussian integration. Each row represents a different
correlation/truncation setting. From left to right, the columns show 1) the target function $\gamma_1\gamma_2$, 2) its
first tractable approximation $(\gamma_1\Psi)^{\alpha_1}$ (product of orthogonal univariate functions), 3) its second
tractable approximation $(\gamma_2/\Psi)^{\alpha_2}$ (correlated Gaussian distribution) and 4) the VB approximation.
Symbols ’x’ and ’+’ are the exact and approximate means, respectively.

Initialization We need to initialize the parameters so that the integral is tractable. This is
not always trivial, but in principle, any point in the convex set
$\mathcal{T} = \{(\tau_1, \tau_2) \in \mathbb{R}^{2n} \mid 0 \prec \mathrm{diag}(\tau_1) \prec A\}$ leads to a finite integral. For example, setting every
$\tau_{1i}$ to half of the minimum eigenvalue of $A$ gives a point within the convex set.
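The initialization rule can be sketched without any linear algebra library. In the example below, positive definiteness is tested by attempting a Cholesky factorization, and the smallest eigenvalue is located by bisection on a diagonal shift; both are illustrative substitutes for a library eigenvalue routine, the function names are hypothetical, and $A$ is assumed to be symmetric positive definite.

```python
import math

def is_positive_definite(m):
    """Attempt a Cholesky factorization; it succeeds iff the symmetric matrix m is positive definite."""
    n = len(m)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                chol[i][i] = math.sqrt(s)
            else:
                chol[i][j] = s / chol[j][j]
    return True

def shifted(a, diag):
    """Return a - diag(diag) as a new matrix."""
    n = len(a)
    return [[a[i][j] - (diag[i] if i == j else 0.0) for j in range(n)] for i in range(n)]

def min_eigenvalue(a, iters=60):
    """Bisection on s: a - s*I is positive definite iff s < lambda_min(a)."""
    n = len(a)
    lo, hi = 0.0, min(a[i][i] for i in range(n))  # lambda_min never exceeds a diagonal entry
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if is_positive_definite(shifted(a, [mid] * n)):
            lo = mid
        else:
            hi = mid
    return lo

def initial_tau1(a):
    """Set every tau_1i to half of the smallest eigenvalue of A."""
    return [0.5 * min_eigenvalue(a)] * len(a)

A = [[2.0, 1.0], [1.0, 2.0]]          # eigenvalues 1 and 3
tau1 = initial_tau1(A)
print(tau1, is_positive_definite(shifted(A, tau1)))
```

The second component $\tau_2$ is unconstrained, so it can simply be initialized at zero.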

3 Lower bounding is not straightforward, since the classical approach to obtain lower bounds, which is based on
the information inequality, requires a class of approximations that is contained in the support of the target
distribution, and this is not the case for multivariate Gaussian distributions.
Bound minimization After simplification, we get the following overall objective for the upper
bound of a multivariate truncated Gaussian:

$$ \log I_\alpha(\Psi(\cdot;\tau)) = \frac{1}{\alpha_1} \sum_{i=1}^n \log U[f_i](\tau_{1i}, \tau_{2i}, \alpha_1) + \frac{1}{\alpha_2}\, J\!\left(\alpha_2 (A - \mathrm{diag}(\tau_1)),\ \alpha_2 (b - \tau_2)\right) + \frac{n}{2\alpha_2} \log(2\pi). \tag{15} $$

Figure 1 presents results on a two-dimensional truncated Gaussian integration problem. The
optimal value of $\alpha_1$ depends on both the level of truncation and the correlation.
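The whole procedure can be illustrated on a two-dimensional orthant integral with $b = 0$, for which the exact value is available through the bivariate normal orthant probability. The sketch below assembles the log of the bound from the step-function closed form (obtained by completing the square) and the Gaussian term $J$, parameterizes $1/\alpha_1$ through a logistic function of an unconstrained variable, and minimizes with a derivative-free random local search. The local search is only a stand-in for the L-BFGS optimizer used in the experiments, and the matrix $A$, the box limits, the step size and all function names are assumptions made for this example.

```python
import math
import random

def log_norm_cdf(x):
    return math.log(0.5 * math.erfc(-x / math.sqrt(2.0)))

def log_u_step(a, b, alpha):
    """Log of the univariate step-function integral in closed form (requires a > 0)."""
    return (0.5 * math.log(2.0 * math.pi / (alpha * a))
            + alpha * b * b / (2.0 * a)
            + log_norm_cdf(b * math.sqrt(alpha / a)))

def j_2x2(m, v):
    """J(M, v) = -0.5 log|M| + 0.5 v^T M^{-1} v for a symmetric positive definite 2x2 M."""
    det = m[0][0] * m[1][1] - m[0][1] ** 2
    quad = (m[1][1] * v[0] ** 2 - 2.0 * m[0][1] * v[0] * v[1] + m[0][0] * v[1] ** 2) / det
    return -0.5 * math.log(det) + 0.5 * quad

def vh_objective(params, A, b):
    """Log of the VH bound for n = 2; params = (tau11, tau12, tau21, tau22, s), s = logit(1/alpha1)."""
    t11, t12, t21, t22, s = params
    # Box limits are a numerical-safety assumption of this sketch, not part of the method.
    if min(t11, t12) < 0.1 or max(abs(t21), abs(t22)) > 2.0 or abs(s) > 3.0:
        return math.inf
    m = [[A[0][0] - t11, A[0][1]], [A[1][0], A[1][1] - t12]]
    if m[0][0] <= 0.0 or m[0][0] * m[1][1] - m[0][1] ** 2 <= 0.0:
        return math.inf                        # A - diag(tau1) must stay positive definite
    inv_a1 = 1.0 / (1.0 + math.exp(-s))        # 1/alpha1, strictly inside (0, 1)
    a1, a2 = 1.0 / inv_a1, 1.0 / (1.0 - inv_a1)
    m2 = [[a2 * x for x in row] for row in m]
    v2 = [a2 * (b[0] - t21), a2 * (b[1] - t22)]
    univariate = log_u_step(t11, t21, a1) + log_u_step(t12, t22, a1)
    gaussian = j_2x2(m2, v2) + math.log(2.0 * math.pi)   # (n/2) log(2 pi) with n = 2
    return inv_a1 * univariate + (1.0 - inv_a1) * gaussian

def exact_log_integral(A):
    """log I* for b = 0 and n = 2, from the bivariate normal orthant probability."""
    det = A[0][0] * A[1][1] - A[0][1] ** 2
    rho = -A[0][1] / math.sqrt(A[0][0] * A[1][1])         # correlation of the covariance A^{-1}
    orthant = 0.25 + math.asin(rho) / (2.0 * math.pi)
    return math.log(2.0 * math.pi / math.sqrt(det) * orthant)

def local_search(objective, start, steps=3000, scale=0.1, seed=0):
    """Derivative-free random local search: keep a Gaussian perturbation only if it improves."""
    rng = random.Random(seed)
    best, best_val = list(start), objective(start)
    for _ in range(steps):
        cand = [x + rng.gauss(0.0, scale) for x in best]
        val = objective(cand)
        if val < best_val:
            best, best_val = cand, val
    return best, best_val

A, b = [[2.0, 1.0], [1.0, 2.0]], [0.0, 0.0]
start = [0.5, 0.5, 0.0, 0.0, 0.0]   # tau1 = half the smallest eigenvalue of A, alpha1 = 2
objective = lambda p: vh_objective(p, A, b)
best, best_val = local_search(objective, start)
print(f"log I* = {exact_log_integral(A):.4f}, initial bound = {objective(start):.4f}, "
      f"optimized bound = {best_val:.4f}")
```

Every feasible parameter vector gives a valid upper bound on $\log I^*$, so the search can be stopped at any time without losing the guarantee.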


6 Comparison with Variational Bayes 

6.1 Variational Holder vs. Variational Bayes 

One can compare the VH inequality (3) to the one provided by the Variational Bayes (VB)
inequality:

$$ \log I^* \geq \int \log \gamma_1(z)\, dQ(z) + \int \log \gamma_2(z)\, dQ(z) + \mathcal{H}(Q), \tag{16} $$

for any distribution $Q$ absolutely continuous with respect to $\nu$, where $\mathcal{H}$ denotes the information
entropy and $q = \frac{dQ}{d\nu}$.

VB provides a lower bound to the log-sum-exp function, while VH provides an upper
bound. One disadvantage of the VB bound (16) is that it is not concave in general, leading
to objective functions that are difficult to maximize and a bound that does not come with
theoretical guarantees. Another disadvantage is that the approximating distribution $Q$ must
have a support included in that of the base measure $\nu$, which is not always convenient when the
target distribution has subspaces with zero probability. This zero-avoiding effect of the VB
bound can lead to crude approximations of the original integral (Minka, 2005).


6.2 Variational Bayes for truncated multi-variate Gaussian integration 


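As a comparison, we consider in this section the VB approach, which gives a lower bound to the
likelihood. The key idea to be able to apply VB to this problem is to consider independent
truncated Gaussians for the approximation family. We start with (10):

$$ \log I^* := \log \int_{\mathbb{R}^n} \prod_{i=1}^n f_i(t_i)\, e^{-\frac{1}{2} t^T A t + b^T t}\, dt \geq L \tag{17} $$

where $L$ denotes the negative variational free energy, a.k.a. the variational lower bound, given
by

$$ L := \sum_{i=1}^n E_q[\log f_i(t_i)] - \frac{1}{2} \mathrm{tr}\!\left(A\, E_q[t t^T]\right) + b^T E_q[t] + \mathcal{H}[q]. \tag{18} $$

We choose the variational distribution $q$ to be a product of univariate normal distributions
which are truncated at zero. Let $q = \prod_{i=1}^n \mathcal{TN}(\mu_i, \sigma_i)$, where $\mathcal{TN}(\mu_i, \sigma_i)$ denotes the normal
distribution with location $\mu_i$ and scale $\sigma_i$ restricted to $t_i > 0$. The term $E_q[\log f_i(t_i)]$ vanishes
because $q$ is supported on $t_i > 0$, where $f_i = 1$. The variational bound is given by

$$ L = -\frac{1}{2} \mathrm{tr}\!\left(A\, E_q[t t^T]\right) + b^T E_q[t] + \frac{n}{2} \log(2\pi e) + \sum_{i=1}^n \left( \log\!\left( \sigma_i \Phi\!\left(\frac{\mu_i}{\sigma_i}\right) \right) - \frac{\mu_i \exp\!\left(-\frac{\mu_i^2}{2\sigma_i^2}\right)}{2\sqrt{2\pi}\, \sigma_i\, \Phi\!\left(\frac{\mu_i}{\sigma_i}\right)} \right), \tag{19} $$

where $E_q[t_i] = \mu_i + \frac{\sigma_i}{\sqrt{2\pi}\, \Phi(\mu_i/\sigma_i)} \exp\!\left(-\frac{\mu_i^2}{2\sigma_i^2}\right)$,
$E_q[t_i^2] = \mu_i^2 + \sigma_i^2 + \frac{\mu_i \sigma_i}{\sqrt{2\pi}\, \Phi(\mu_i/\sigma_i)} \exp\!\left(-\frac{\mu_i^2}{2\sigma_i^2}\right)$ and $E_q[t_i t_j] = E_q[t_i]\, E_q[t_j]$ for all $i \neq j$.

Maximization of the lower bound could be done iteratively by solving one-dimensional truncated
Gaussian fits in a round-robin fashion; however, in the experiments below, we computed the
gradient of the variational objective and used a gradient-based technique to find the optimal
variational parameters.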
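The ingredients of the VB bound are the mean, second moment and entropy of a normal distribution truncated to the positive half-line. The sketch below implements these standard formulas and assembles the lower bound for a product of such distributions; it then checks the bound against the exact value of a two-dimensional orthant integral with $b = 0$. The test matrix, the variational parameters and the function names are assumptions chosen for illustration.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def truncated_normal_stats(mu, sigma):
    """Mean, second moment and entropy of N(mu, sigma^2) restricted to t > 0."""
    ratio = norm_pdf(mu / sigma) / norm_cdf(mu / sigma)
    mean = mu + sigma * ratio
    second = mu * mu + sigma * sigma + mu * sigma * ratio
    entropy = (0.5 * math.log(2.0 * math.pi * math.e)
               + math.log(sigma * norm_cdf(mu / sigma))
               - 0.5 * (mu / sigma) * ratio)
    return mean, second, entropy

def vb_lower_bound(mu, sigma, A, b):
    """Variational lower bound L for step-function factors and a factorized truncated-normal q."""
    n = len(mu)
    stats = [truncated_normal_stats(m, s) for m, s in zip(mu, sigma)]
    quad = 0.0
    for i in range(n):
        for j in range(n):
            # E[t_i t_j] factorizes for i != j because q is a product distribution.
            e_titj = stats[i][1] if i == j else stats[i][0] * stats[j][0]
            quad += A[i][j] * e_titj
    linear = sum(b[i] * stats[i][0] for i in range(n))
    entropy = sum(s[2] for s in stats)
    # E_q[log f_i] = 0: q puts all its mass on t_i > 0, where the step function equals 1.
    return -0.5 * quad + linear + entropy

A, b = [[2.0, 1.0], [1.0, 2.0]], [0.0, 0.0]
log_exact = math.log(math.pi / (3.0 * math.sqrt(3.0)))   # exact orthant integral for this A
bound = vb_lower_bound([0.3, 0.3], [0.7, 0.7], A, b)
print(f"VB lower bound = {bound:.4f}, exact log I* = {log_exact:.4f}")
```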


7 Experiments 

To understand the properties of the VH bound, we compared it with existing deterministic
integration methods in high dimension. We considered the ground truth to
be the method of Alan Genz (Genz and Bretz, 2009), which is based on a sophisticated technique
of pseudo-random number generation. Its main interest is that it provides an estimate
of its own error, so that we can evaluate precisely the validity of the various techniques. We used
the MATLAB code provided by the author. Another efficient technique for integration of truncated
Gaussians is based on the use of Expectation-Propagation (EP), as described by Cunningham
et al. (2011). In this case as well, MATLAB code is provided by the authors. Finally, we
implemented the VB version described above to obtain a lower bound to the true integral.
Both VB and VH objectives were minimized using the L-BFGS algorithm provided by the MATLAB
fminunc function.

We used multiple correlation settings, where the precision matrix $A$ was obtained using the
following rule: $A := \kappa I + v v^T$, where $v$ is drawn from an $n$-dimensional Gaussian distribution
with unit covariance matrix. We varied the correlation by setting $\kappa \in \{0.1, 1\}$ and the dimension
by varying $n \in \{5, 20, 50\}$. We also compared the accuracy of the moment computation, since
a large gap in the bound does not always imply a large difference in the results. The method of
Genz did not output moments, so we computed the Euclidean norm of the pairwise differences
between the means of the target distribution estimated by the three methods EP, VB and
VH.
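The construction of the test matrices can be sketched in a few lines. The seed handling and the function name below are assumptions of this example; the rule itself is a scaled identity plus a rank-one term, so a smaller value of the scale makes the rank-one term dominate and the variables more strongly correlated.

```python
import random

def random_precision(n, kappa, seed=0):
    """Return kappa * I + v v^T as a list of lists, with v drawn from a standard normal."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [[(kappa if i == j else 0.0) + v[i] * v[j] for j in range(n)] for i in range(n)]

# The six settings used in the comparison: two correlation levels, three dimensions.
settings = [(kappa, n) for n in (5, 20, 50) for kappa in (0.1, 1.0)]
matrices = {(kappa, n): random_precision(n, kappa, seed=n) for kappa, n in settings}
print(len(matrices), len(matrices[(0.1, 5)]))
```

Each matrix is positive definite by construction, since its quadratic form is $\kappa \|x\|^2 + (v^T x)^2$.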

Table 1 gives the results. The first 4 columns compare the integral values. We can see that
VB correctly estimates a lower bound to the true integral, and that VH consistently gives an
upper bound. EP seems to be generally very accurate, sometimes over-estimating, sometimes
under-estimating the exact integral. An interesting phenomenon is that VH is more accurate
than VB in the high correlation setting ($\kappa = 0.1$). This is expected since VB is unable to use
correlation, due to the fact that the approximating family is composed of independent truncated
Gaussian variables.

When comparing the moment computation, we see that VH can give very accurate results,
even if the gap in the bound was large. We also see that the higher the dimension, the better VH
becomes with respect to VB. We also notice that in the high correlation setting, VH and EP are
closer to each other, compared to VB, suggesting again that high correlations are well handled
by the VH approximation.


κ      n     Genz       EP         VB         VH
0.1    5     5.1499     5.1327     2.9489     6.4169
1      5     0.41768    0.41234    0.10715    0.98725
0.1    20    24.7689    24.7702    17.5524    28.2854
1      20    1.9203     1.9196     0.97199    2.8037
0.1    50    66.2055    66.1991    44.3699    68.971
1      50    9.9        9.8999     5.9196     11.2919

κ      n     VB vs EP   VH vs EP   VB vs VH
0.1    5     1.9286     0.35811    1.7518
1      5     0.12856    0.073666   0.19735
0.1    20    3.0963     3.3148     2.9741
1      20    0.19727    0.074579   0.27093
0.1    50    5.5944     0.72705    5.9132
1      50    0.51551    0.076875   0.58154

Table 1: Comparison of log partition function (top) and error in first moment (bottom)



















8 Generalization for many factors 


Here, we consider the more general case where the integral to compute is the product of $K$
factors, $K \geq 2$: $I^* := \int \prod_{k=1}^K f_k(z)\, d\nu(z)$ where $f_k$, $k = 1, \dots, K$ are the individual factors. We
have the following result:


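Theorem 2 The following inequality:

$$ I^* \leq \prod_{k=1}^K \left( \int \left( f_k(z)\, \Psi_k(z) \right)^{\alpha_k} d\nu(z) \right)^{\frac{1}{\alpha_k}} \tag{20} $$

holds for any $\alpha = (\alpha_1, \dots, \alpha_K) \in (0, \infty)^K$ such that $\sum_{k=1}^K \frac{1}{\alpha_k} = 1$ and any functions $\Psi_k : \mathcal{Z} \to
\mathbb{R}^+$ with $\prod_{k=1}^K \Psi_k = 1$ and $f_k \Psi_k \in L_{\alpha_k}$, $k = 1, \dots, K$.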


PROOF (sketch). Similarly to the generalization of Holder’s inequality for a product of functions,
we can apply the binary VH bound recursively. □

One can verify that we recover the results of Section 2 for $K = 2$. The VH method can be
obtained by parameterizing the pivot functions and minimizing (20) with respect to the pivot
functions $\Psi_k$ and $\alpha$. Tightness and approximation properties studied in Section 3 can also be
extended to the case of multiple factors.


9 Discussion 

We have introduced a new family of variational approximations that are based on the
minimization of an upper bound to the log-partition function. We demonstrated that the variational
inference problem is convex if the variational function is log-linear, which has great practical and
theoretical advantages over mean-field/VB approximations, which are the main approaches used
today by practitioners. We also provide a novel way to handle Gaussian integration problems.
In fact, we could express probit regression as a special case of this problem, and the extension
to other distributions is possible in theory. One of the unique features of this approach is that
the approximation maintains the heavy tails, but we still have a convex objective. Further
experiments will be conducted to evaluate how well this approximation behaves in Bayesian
posterior estimation.

The VH framework presented here is very general, and can be applied to many models and
optimized using a large variety of algorithms and speedup tricks, similarly to what happened
with VB and EP over the last two decades.

We focused mainly on one type of intractable integral that is common in machine learning
problems (GLM or linear models with sparse priors), but the approach is generic and could
potentially be applied in many other settings. One of the main areas of application is inference
in graphical models with discrete variables, for which the TRW sum-product algorithm has been
designed (Wainwright et al., 2005), as well as several other algorithms dedicated to discrete
graphical models (Liu and Ihler, 2011). TRW also provides an upper bound to the log-partition
function and is convex if the tree-weights are known. An alternative proof of the TRW bound
based on the Holder inequality was given by Minka (2005), and we conjecture that the TRW
bound could be expressed as a special case of the proposed approach, for example by assuming
that there is one factor per possible spanning tree.
Appendix

Pre-requisites


In the following, the symbols $f$ and $g$ represent $\nu$-measurable positive functions, and $p$ and $q$
are positive scalars such that $\frac{1}{p} + \frac{1}{q} = 1$.

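Lemma 1 For any $\epsilon > 0$, the inequality $\|fg\|_1 \geq (1 - \epsilon) \|f\|_p \|g\|_q$ implies that:

$$ \left\| \frac{f^p}{\|f\|_p^p} - \frac{fg}{\|f\|_p \|g\|_q} \right\|_1 \leq \sqrt{2\epsilon} \ \text{ if } p \leq 2, \text{ and} \qquad
   \left\| \frac{g^q}{\|g\|_q^q} - \frac{fg}{\|f\|_p \|g\|_q} \right\|_1 \leq \sqrt{2\epsilon} \ \text{ if } p > 2. \tag{21} $$

PROOF. We have

$$ \left\| \frac{f^p}{\|f\|_p^p} - \frac{fg}{\|f\|_p \|g\|_q} \right\|_1
   \leq \underbrace{\left\| \frac{f^{p/2}}{\|f\|_p^{p/2}} \right\|_2}_{=1}
   \left\| \frac{f^{p/2}}{\|f\|_p^{p/2}} - \frac{f^{1 - p/2}\, g}{\|f\|_p^{1 - p/2} \|g\|_q} \right\|_2 $$

by the Cauchy-Schwarz inequality. We can expand the square of the right-hand term in the product:

$$ \left\| \frac{f^{p/2}}{\|f\|_p^{p/2}} - \frac{f^{1 - p/2}\, g}{\|f\|_p^{1 - p/2} \|g\|_q} \right\|_2^2
   = \underbrace{\frac{\|f^{p/2}\|_2^2}{\|f\|_p^p}}_{=1}
   - 2 \underbrace{\frac{\|fg\|_1}{\|f\|_p \|g\|_q}}_{\geq 1 - \epsilon}
   + \underbrace{\frac{\|f^{1 - p/2}\, g\|_2^2}{\|f\|_p^{2 - p} \|g\|_q^2}}_{:= A} \tag{22} $$

where we denote by $A$ the quantity:

$$ A := \frac{\|f^{1 - p/2}\, g\|_2^2}{\|f\|_p^{2 - p} \|g\|_q^2} = \frac{\|f^{2 - p} g^2\|_1}{\|f\|_p^{2 - p} \|g\|_q^2}. $$

Assuming $p \leq 2$, we can now bound $A$ by using Holder’s inequality with exponents $p' = \frac{p}{2 - p}$ and
$q' = \frac{q}{2}$. One can verify that the pair $(p', q')$ is a valid pair of Holder’s exponents: $q' \geq 1$, $p' \geq 1$ and
$\frac{1}{p'} + \frac{1}{q'} = \frac{2 - p}{p} + \frac{2}{q} = \frac{2}{p} + \frac{2}{q} - 1 = 2 - 1 = 1$. We obtain:

$$ A \leq \frac{\|f^{2 - p}\|_{p'} \|g^2\|_{q'}}{\|f\|_p^{2 - p} \|g\|_q^2} = \frac{\|f\|_p^{2 - p} \|g\|_q^2}{\|f\|_p^{2 - p} \|g\|_q^2} = 1. $$

This result proves that Equation (22) is upper bounded by $2\epsilon$, so Equation (22) leads to the
following inequality:

$$ \left\| \frac{f^p}{\|f\|_p^p} - \frac{fg}{\|f\|_p \|g\|_q} \right\|_1 \leq \sqrt{2\epsilon}. \tag{23} $$

The second inequality in (21) follows by symmetry. □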
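Theorem 3 For any $\epsilon > 0$, the inequality $\|fg\|_1 \geq (1 - \epsilon) \|f\|_p \|g\|_q$ implies that:

$$ \left\| \frac{f^p}{\|f\|_p^p} - \frac{fg}{\|fg\|_1} \right\|_1 \leq \sqrt{2\epsilon} + \epsilon \ \text{ if } q \geq p, \text{ and} \qquad
   \left\| \frac{g^q}{\|g\|_q^q} - \frac{fg}{\|fg\|_1} \right\|_1 \leq \sqrt{2\epsilon} + \epsilon \ \text{ if } p > q. \tag{24} $$

PROOF. Assume $q \geq p$, i.e. $p \leq 2$. By the triangle inequality,

$$ \left\| \frac{f^p}{\|f\|_p^p} - \frac{fg}{\|fg\|_1} \right\|_1
   \leq \underbrace{\left\| \frac{f^p}{\|f\|_p^p} - \frac{fg}{\|f\|_p \|g\|_q} \right\|_1}_{\leq \sqrt{2\epsilon} \text{ by Lemma 1}}
   + \underbrace{\left\| \frac{fg}{\|f\|_p \|g\|_q} - \frac{fg}{\|fg\|_1} \right\|_1}_{\leq \epsilon}, $$

where the second term equals $1 - \frac{\|fg\|_1}{\|f\|_p \|g\|_q}$, which is at most $\epsilon$ by assumption.
The second inequality in (24) follows by symmetry. □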

Proof of Proposition 1 

We are now ready to obtain the proof for the bound approximation property: 

PROOF. Apply Theorem 3 with $f := \gamma_1\Psi$, $g := \gamma_2/\Psi$, $p := \alpha_1$, $q := \alpha_2$, $I^* := \|fg\|_1$ and
$I_\alpha(\Psi) := \|\gamma_1\Psi\|_p \|\gamma_2/\Psi\|_q$. □